Apparatus and method for performing RC4 ciphering

ABSTRACT

An arrangement is provided for performing RC4 ciphering. The arrangement includes apparatuses and methods that pipeline generation of a key stream based on a byte state array, called the S-box, which is initially generated from a secret key shared by a receiver and a transmitter in a network system. The S-box is stored in a storage device which may be a register file with two read ports and one write port. A cache is used to store a number of bytes read from the S-box storage device.

RESERVATION OF COPYRIGHT

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND

1. Field

The present disclosure relates generally to network security and, more specifically, to apparatuses and methods for performing RC4 ciphering.

2. Description

Networks enable computers and other devices to communicate. For example, networks can carry data representing video, audio, e-mail, and so forth. However, network systems are subject to many threats, including loss of privacy, loss of data integrity, identity spoofing, and denial-of-service attacks. To address these threats, many measures have been developed and employed to improve the security of network communications. For example, a Rivest Cipher 4 (RC4) algorithm is selected by the Wired Equivalent Privacy (WEP), part of the IEEE 802.11 standard, to secure Wireless Fidelity (“WiFi”) networks, and by the Secure Sockets Layer (SSL) communications protocol to improve the security of communications on the Internet.

The RC4 algorithm is a symmetric key stream cipher algorithm. A symmetric key algorithm is an algorithm for cryptography that uses the same cryptographic key to encrypt and decrypt the message. Symmetric key algorithms can be divided into stream ciphers and block ciphers. Stream ciphers encrypt the bits of the message one at a time, and block ciphers take a number of bits and encrypt them as a single unit. The RC4 ciphering process operates as a pseudo-random number generator initialized from a secret key of up to 256 bytes. The RC4 ciphering process generates a series of bytes, called a key stream. Input text data (“plain text”) is encrypted by performing an exclusive-or (“XOR”) operation between the plain text and the key stream. The result of the XOR operation is a cipher text corresponding to the input text data. Decryption is performed by producing the same key stream and XORing it with the cipher text to reproduce the plain text. If the RC4 ciphering process is implemented in hardware, it may be more desirable to use less complex hardware components than more complex hardware components because less complex components may be more commonly available. Also in a hardware implementation, smaller die area translates to lower costs, higher yields, and often lower power, which are beneficial to network communications.

BRIEF DESCRIPTION OF THE DRAWINGS

The features and advantages of the present disclosure will become apparent from the following detailed description of the present disclosure in which:

FIG. 1 is a diagram illustrating a general network system;

FIG. 2 shows a pseudo code illustrating how the RC4 ciphering process encrypts a plain text;

FIG. 3 is a diagram illustrating an example implementation of the RC4 ciphering process;

FIG. 4 is a table illustrating the pipelining of the RC4 ciphering process, according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating another example implementation of the RC4 ciphering process, according to an embodiment of the present invention;

FIG. 6 is a table illustrating the pipelining of the RC4 ciphering process, according to another embodiment of the present invention; and

FIG. 7 is a diagram of a network system.

DETAILED DESCRIPTION

An embodiment of the present invention comprises an apparatus and method for performing RC4 ciphering. The RC4 ciphering process operates by producing a series of bytes, called a key stream. A plain text is encrypted by XORing each byte of the plain text with the corresponding byte of the key stream to produce a cipher text. Decryption is performed by producing the same key stream and XORing it with the cipher text to reproduce the plain text. The RC4 ciphering process maintains an internal state in the form of a 256 byte state array (called the S-box) and two index variables i and j. The initial values of the S-box are generated from the shared secret key (both receiver and transmitter must have the same key). The key stream is produced by manipulating the values of the S-box and the i and j variables. In a typical hardware implementation of the RC4 ciphering process, the S-box is stored in a register file. Because the production of one byte of the key stream involves three read operations and two write operations with an S-box storage device, a register file with three read ports and two write ports (“3-read/2-write register file”) is thus desirable for the S-box. Using a 3-read/2-write register file to store the S-box along with the pipelining technology, the key-stream production may achieve a throughput of one byte per clock cycle. However, 3-read/2-write register files are not commonly available because they are complex and more expensive compared to register files with fewer read/write ports. According to an embodiment of the present invention, a cache may be used to store a number of bytes read from the S-box storage device. This way, the number of read operations from the S-box storage device required per clock cycle may be reduced so that a register file with two read ports and one write port (“2-read/1-write register file”) may be used to store the S-box. A 2-read/1-write register file is more common and less expensive than a 3-read/2-write register file. In one embodiment, using a cache and a 2-read/1-write register file along with the pipelining technology, the RC4 ciphering process may be implemented more efficiently in hardware than using a 3-read/2-write register file, without significantly sacrificing the throughput of the key stream generation processing.

Reference in the specification to “one embodiment” or “an embodiment” of the present disclosure means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrase “in one embodiment” appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

FIG. 1 depicts a general network system 110 that supports a number of terminals. The network system 110 may comprise a number of network devices such as routers, switches, and bridges to facilitate data passage from one terminal to another. The network system may be a wireless system, an Ethernet system, any other system, or a combination of different network systems. The network system may employ a satellite 120 to help connect one terminal to another terminal. The terminals of the network system may comprise servers (130), desktop computers (140), personal digital assistants (PDAs) (150), cell phones (160), laptop computers (170), or other devices. Data communicated among different terminals may include video, audio, messages, and other data. The network system may use the WEP, SSL, or other standard for communication security. As a component of the WEP, SSL, or other standards, the RC4 ciphering process may be employed to encrypt data to ensure confidential communication and the integrity of communications.

FIG. 2 shows a pseudo code illustrating how a plain text is encrypted using the RC4 ciphering process. The RC4 ciphering process generates a key stream byte by byte based on the S-box. The initial values of the S-box are generated from the shared secret key. In the pseudo code as shown in FIG. 2, S[i] represents the value of the i^(th) byte of the S-box, and S[j] represents the value of the j^(th) byte of the S-box. K represents one byte of the generated key stream, Ptext is one byte of the plain text and Ctext is one byte of the cipher text. In line 1, the index variables i and j are initialized to 0. Line 2 starts a loop that encrypts the plain text byte by byte. In line 3, the value of index variable i is updated by increasing its previous value by 1. In line 4, the i^(th) byte of the S-box is read from the S-box storage device; and the value of index variable j is updated based on its previous value and the value of the i^(th) byte of the S-box, S[i]. In line 5, the value of the i^(th) byte and the value of the j^(th) byte of the S-box are swapped, which involves reading the j^(th) byte of the S-box from the S-box storage device and writing S[i] and S[j] back to the S-box storage device as the j^(th) byte and i^(th) byte, respectively. In line 6, a third index t is obtained by adding S[i] and S[j] together. Because the S-box has a total of 256 bytes, the value of any index, i, j, or t, must be between 0 and 255 (including 0 and 255). This explains why a “mod 256” operation is needed in obtaining i, j, or t in lines 3, 4, and 6. In line 7, the t^(th) byte of the S-box is read from the S-box storage device and is used as the current byte, K, of the generated key stream. In line 8, a cipher byte, Ctext, is generated for the current byte of the plain text, Ptext, by XORing K with Ptext. The operations between line 2 and line 8 are iterated until all bytes in the plain text are encrypted. The encryption loop ends in line 9 when all bytes in the plain text are processed. The decryption process is identical to the encryption process except for the operation described in line 8: during decryption, Ctext is XORed with K to generate Ptext.
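By way of illustration only, the loop described with reference to FIG. 2 may be sketched in software as follows. The function names rc4_init_sbox and rc4_crypt are hypothetical. The S-box initialization shown is the commonly published RC4 key scheduling step, which is assumed here because the present description states only that the initial S-box values are generated from the shared secret key.

```python
def rc4_init_sbox(key: bytes) -> list[int]:
    """Assumed S-box setup: the commonly published RC4 key scheduling step."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(sbox: list[int], data: bytes) -> bytes:
    """Encrypt (or decrypt) data byte by byte, following lines 1-9 of FIG. 2."""
    s = list(sbox)                     # work on a copy of the initial S-box
    i = j = 0                          # line 1: initialize both indices
    out = bytearray()
    for ptext in data:                 # line 2: loop over the input bytes
        i = (i + 1) % 256              # line 3: advance i
        j = (j + s[i]) % 256           # line 4: read S[i], update j
        s[i], s[j] = s[j], s[i]        # line 5: swap S[i] and S[j]
        t = (s[i] + s[j]) % 256        # line 6: third index t
        k = s[t]                       # line 7: key stream byte K
        out.append(ptext ^ k)          # line 8: XOR with the input byte
    return bytes(out)                  # line 9: all bytes processed


if __name__ == "__main__":
    sbox = rc4_init_sbox(b"Key")
    ctext = rc4_crypt(sbox, b"Plaintext")
    print(ctext.hex())
    # Decryption regenerates the same key stream from the same initial S-box.
    print(rc4_crypt(sbox, ctext))
```

Because the same routine serves for both directions, a receiver holding the same secret key recovers the plain text simply by running the loop over the cipher text.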

FIG. 3 depicts an example implementation of the encryption process using the RC4 ciphering process, according to an embodiment of the present invention. The S-box storage device 310 stores a 256 byte S-box. In one embodiment, the S-box storage device may be a register file. In another embodiment, the S-box storage may consist of flip-flops. When flip-flops are used, it may take approximately 2048 flops (256 bytes of 8 bits each) to store the 256 byte S-box. Implementation of 2048 flops may require a larger die area than a register file. The key stream generator 320 generates a key stream byte by byte by performing operations as illustrated from line 3 to line 7 of FIG. 2. The operations may comprise increasing a first index variable i, reading the i^(th) byte of the S-box from the S-box storage device, calculating a second index variable j, swapping the i^(th) byte and j^(th) byte of the S-box in the S-box storage device, calculating a third index t, and reading the t^(th) byte of the S-box from the S-box storage device. The controller 330 controls the S-box storage device and the key stream generator to produce the key stream, one byte at a time. The controller may instruct the S-box storage device when to make a particular byte available for read/write. The S-box storage device may also inform the controller when a particular byte is ready for read/write operations so that the controller may direct the key stream generator to perform a read/write operation from/to the S-box storage device. Additionally, the controller may help resolve conflicts or correct errors for any read/write operations. The XOR mechanism 340 performs an XOR operation between the input data 342 and the key stream 344 to produce encrypted data 346 from the input data. The XOR mechanism performs the XOR operation one byte at a time under the direction of the controller. Thus, the output encrypted data 346 is produced byte by byte.

In many applications, it is desirable to have a faster encryption speed because a slower encryption speed causes longer delays and limits the bandwidth of network communications. To improve the encryption speed of the RC4 ciphering process, the key stream generation process may be pipelined in one embodiment. A key stream generator using the pipelining technology may produce one byte of the key stream per clock cycle, that is, the key stream generator may achieve a throughput of one byte per cycle. FIG. 4 illustrates how a five-stage pipeline may be used to generate a key stream for the RC4 ciphering process. In order to pipeline the RC4 ciphering, the key stream generator and the controller (as shown in FIG. 3) may need to be modified to support the pipelining scheme. In FIG. 4, “Raddr n_(m)” denotes an operation of reading the n_(m) ^(th) byte of the S-box from the S-box storage device for generating the m^(th) byte of the key stream, K_(m). Similarly, “Waddr n_(m)” denotes an operation of writing to the S-box storage device at the n_(m) ^(th) byte of the S-box for generating the m^(th) byte of the key stream, K_(m). For convenience, FIG. 4 does not show any addition operation that comes with each read operation.

Generation of the first byte of the key stream, K₁, starts at cycle 1, during which the value of index variable i₁ is obtained by increasing the initial value i₀ by 1, and the i₁ ^(th) byte of the S-box, S[i₁], is read from the S-box storage device. In cycle 2, the value of index variable j₁ is calculated by adding the previous value of j (i.e., initial value of j, which is j₀) and S[i₁] (i.e., j₁=(j₀+S[i₁]) mod 256), and subsequently the j₁ ^(th) byte of the S-box, S[j₁], is read from the S-box storage device. Additionally, generation of K₂ starts at cycle 2, during which i₂ is obtained by increasing i₁ by 1 and S[i₂] is read from the S-box storage device. In cycle 3, S[j₁] is written back to the S-box storage device to replace the i₁ ^(th) byte of the S-box; j₂ is obtained (j₂=(j₁+S[i₂]) mod 256) and S[j₂] is read from the S-box storage device; and generation of K₃ starts with obtaining i₃ (=(i₂+1) mod 256) and reading S[i₃] from the S-box storage device. In cycle 4, S[i₁] is written back to the S-box storage device to replace the j₁ ^(th) byte of the S-box; S[j₂] is written back to the S-box storage device to replace the i₂ ^(th) byte of the S-box; j₃ is obtained (j₃=(j₂+S[i₃]) mod 256) and S[j₃] is read from the S-box storage device; and generation of K₄ starts with obtaining i₄ (=(i₃+1) mod 256) and reading S[i₄] from the S-box storage device. In cycle 5, the value of index variable t₁ is obtained (t₁=(S[i₁]+S[j₁]) mod 256) and S[t₁] is read from the S-box storage device; S[i₂] is written back to the S-box storage device to replace the j₂ ^(th) byte of the S-box; S[j₃] is written back to the S-box storage device to replace the i₃ ^(th) byte of the S-box; j₄ is obtained (j₄=(j₃+S[i₄]) mod 256) and S[j₄] is read from the S-box storage device; and generation of K₅ starts with obtaining i₅ (=(i₄+1) mod 256) and reading S[i₅] from the S-box storage device. By the end of cycle 5, the first byte of the key stream, K₁ (=S[t₁]), is generated, which may be used to encrypt the first byte of the input data in cycle 6. From cycle 5 forward (including cycle 5), three read operations and two write operations are performed simultaneously in each cycle. Additionally, a byte of the key stream is generated at the end of each cycle from cycle 5 going forward. When the key stream to be generated contains many bytes, the throughput of the key stream generation process is approximately one byte per cycle (no byte of the key stream is generated within the first 4 cycles).
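The port demand of the five-stage pipeline may be tallied with a short script. The following is a minimal sketch, assuming the stage timing given in the walk-through above (read S[i], read S[j], write to location i, write to location j, read S[t], in five consecutive cycles); the names STAGES and port_demand are hypothetical and do not appear in FIG. 4.

```python
from collections import Counter

# Offset in cycles from the start of an iteration, and the storage operation
# performed at that offset, as read from the cycle-by-cycle description.
STAGES = [
    (0, "read"),   # read S[i]
    (1, "read"),   # read S[j]
    (2, "write"),  # write S[j] to location i
    (3, "write"),  # write S[i] to location j
    (4, "read"),   # read S[t], which is the key stream byte
]


def port_demand(num_bytes: int) -> dict[int, Counter]:
    """Count reads and writes per cycle when one iteration starts every cycle."""
    demand: dict[int, Counter] = {}
    for m in range(1, num_bytes + 1):          # iteration m starts in cycle m
        for offset, operation in STAGES:
            demand.setdefault(m + offset, Counter())[operation] += 1
    return demand


if __name__ == "__main__":
    for cycle, ops in sorted(port_demand(8).items()):
        print(cycle, ops["read"], "reads,", ops["write"], "writes")
```

Under these assumptions the tally reaches three reads and two writes in cycle 5 and stays there for as long as new iterations keep starting, which is the reason a 3-read/2-write storage device is needed for this pipeline.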

Using a five stage pipeline, as shown in FIG. 4, an implementation of the RC4 ciphering process can achieve a throughput of one byte per cycle for the key stream generator. Such an approach requires three read operations and two write operations per cycle. Typically, the S-box for the RC4 algorithm is stored in a register file. Using a five stage pipeline as illustrated in FIG. 4 would require a register file with three read ports and two write ports (“3-read/2-write register file”). 3-read/2-write register files are not commonly available because they are complex and consume a large amount of die area compared to register files with fewer read/write ports. 3-read/2-write register files may even limit the frequency at which the RC4 ciphering process can operate. An alternative approach is to pipeline the key stream generation for the RC4 ciphering process while using a 2-read/1-write register file to store the S-box. Such an alternative approach, however, may reduce the throughput of the key stream generation significantly.

It is noted from FIG. 2 that the value of index variable i is incremented after each iteration, which means that S[i] is read and written from/to successive locations (bytes) in the S-box storage on successive iterations. Also cipher operations using the RC4 algorithm are typically performed on multi-byte messages, and hence multiple iterations of the algorithm as shown in FIG. 2 are performed to generate a key stream with multiple bytes. Therefore, several bytes of the S-box may be read from a register file at a time and stored in a cache. The number of bytes that are read from the register file at a time may vary (for example, it could be 4 or 8). The cache may be made of flip-flops. For the convenience of description, assume that the size of the cache is 8 bytes, that is, 8 bytes of the S-box may be read from the S-box storage device at a time. This would allow the key stream generator to read/write S[i] from/to the cache for the next 8 iterations (as shown in FIG. 2) to generate 8 bytes of key stream. At the end of the 8^(th) iteration, the 8 bytes in the cache may be written back to the register file in one operation. Using this approach, a 2-read/1-write register file may be used to store the S-box and the RC4 algorithm may still be able to achieve a throughput which is close to one byte per cycle for key stream generation.

FIG. 5 depicts an embodiment of the implementation of the RC4 ciphering process using a 2-read/1-write register file along with an N-byte cache (N can be any number). The S-box storage device 510 may be a 2-read/1-write register file. The S-box storage device has two read ports 512 and 514 and one write port 516. The addresses for read ports 512, 514, and write port 516 are provided by the controller 540 through lines 502, 504, and 506, respectively. The cache write mechanism 520 may read N bytes from the S-box storage device and write them to the cache 530. In one embodiment, the cache write mechanism may be integrated into the cache. The cache stores N bytes of the S-box read from the S-box storage device for use of generating N bytes of a key stream. S[i] is read from and written to successive locations in the S-box on successive iterations. Because of this predictability of S[i] among successive iterations of the RC4 ciphering process, N successive bytes of the S-box may be read from the S-box storage device and stored in the cache 530 so that S[i] may be read from and written to the cache directly for the next N iterations of the RC4 ciphering process.

The key generator 550 in FIG. 5 generates a byte of a key stream at the end of each iteration. The controller 540 controls the S-box storage device, the cache write mechanism, the cache, the key generator, and the XOR mechanism 590 so that they can work together to produce the encrypted output data for the input data byte by byte. The key generator 550 comprises a first selector 555, a second selector 560, a third selector 565, a first adder 570, a second adder 575, a swapping mechanism 585, and a scheduler 580. The first selector 555 selects an S[i] among N S-box bytes (532) from the cache 530 for the current iteration of the RC4 ciphering process, under the direction of the controller. The first adder 570 adds the value of j from the preceding iteration (584) and the value of S[i] (552) selected by the first selector to produce the value of j for the current iteration (588), under the direction of the controller. The scheduler 580 may initialize values of i and j to 0 before the first iteration; provide the first selector the value of i for the current iteration (582) by incrementing the value of i from the preceding iteration; and provide the first adder the value of j from the preceding iteration (584), under the direction of the controller. Additionally, the value of j for the current iteration (588), produced by the first adder, is sent to the scheduler so that the scheduler may request the controller 540 to obtain S[j] for the current iteration. It is possible that S[j] is currently in the cache. Because N bytes of S-box stored in the cache represent the most updated version of these bytes, the controller will direct that S[j] be read from the cache instead of the S-box storage device, if both the cache and the S-box storage device contain S[j]. The second selector 560 selects one byte as S[j] among the bytes stored in the cache and the byte read from the S-box storage device, if there is any, under the control of the controller.

S[j] (562), selected by the second selector, is subsequently swapped with S[i] (552), selected by the first selector, by the swapping mechanism 585 under the control of the controller. During the swapping process, S[j] (shown as 564 in FIG. 5) is written to the cache 530 to replace S[i] through the cache write mechanism 520. Meanwhile, if S[j] used for the current iteration was read from the S-box storage device, S[i] (shown as 554 in FIG. 5) is written back to the S-box storage device to replace S[j]. If S[j] was read from the cache, on the other hand, S[i] is written back to the cache to replace S[j]. Furthermore, values of S[i] and S[j], selected by the first and second selectors, respectively, are added together by the second adder 575 to produce a value for a third index variable t (586). The value of t (586) is sent to the scheduler 580 so that the scheduler may request the controller to obtain S[t] for the current iteration. It is possible that S[t] is currently in the cache. Because N bytes of S-box stored in the cache represent the most updated version of these bytes (bytes in the cache are updated during the process of key stream generation), the controller will direct that S[t] be read from the cache instead of the S-box storage device, if both the cache and the S-box storage device contain S[t]. The third selector 565 selects one byte as S[t] among the bytes stored in the cache and the byte read from the S-box storage device, if there is any, under the control of the controller. S[t] (592), selected by the third selector, is the byte of the key stream, K, for the current iteration. Subsequently, K is XORed with the corresponding byte of the input data 594 by the XOR mechanism 590 to produce the encrypted byte 596 for the byte of the input data.

Because values of all S-box index variables, i, j, and t, are between 0 and 255, including 0 and 255 (assuming that the S-box has a total of 256 bytes; if the S-box has a total of M bytes, the value of each index should be between 0 and M-1), the increment operation performed by the scheduler and the addition operations performed by the first and second adders are all modulo 256 (“mod 256”) or “mod M” operations. After N iterations, the N bytes in the cache may be written back to the S-box. Note that values of the N bytes written back may be different from values of the N bytes originally read from the S-box storage device before the N iterations were started.
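The cache behavior described with reference to FIG. 5 may be checked against a plain RC4 loop in software. The following is a behavioral sketch only, not a model of the circuit: it assumes that N consecutive bytes are fetched every N iterations, that any access to an address held in the cache is served by the cache, and that the cached bytes are written back before the next fetch. The names keystream_reference, keystream_cached, and the stats counters are hypothetical, and the starting S-box is an arbitrary permutation rather than one derived from a key.

```python
import random


def keystream_reference(sbox, count):
    """Plain RC4 key stream loop, used only to check the cached model."""
    s, i, j, out = list(sbox), 0, 0, []
    for _ in range(count):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return out


def keystream_cached(sbox, count, n=8):
    """Key stream loop in which S[i] always comes from an n-byte cache."""
    rf = list(sbox)      # models the S-box storage device (register file)
    cache = {}           # address -> most recent value of a cached byte
    stats = {"line_reads": 0, "line_writes": 0, "byte_reads": 0, "byte_writes": 0}

    def read(addr):
        if addr in cache:                 # the cached copy is the newest one
            return cache[addr]
        stats["byte_reads"] += 1
        return rf[addr]

    def write(addr, value):
        if addr in cache:
            cache[addr] = value
        else:
            stats["byte_writes"] += 1
            rf[addr] = value

    def write_back():
        if cache:                         # one wide write of all cached bytes
            for addr, value in cache.items():
                rf[addr] = value
            stats["line_writes"] += 1

    i = j = 0
    out = []
    for step in range(count):
        if step % n == 0:                 # cache exhausted: write back, refill
            write_back()
            base = (i + 1) % 256
            cache = {(base + k) % 256: rf[(base + k) % 256] for k in range(n)}
            stats["line_reads"] += 1
        i = (i + 1) % 256
        s_i = cache[i]                    # S[i] never touches the register file
        j = (j + s_i) % 256
        s_j = read(j)
        write(i, s_j)                     # swap: S[j] goes to location i
        write(j, s_i)                     # swap: S[i] goes to location j
        out.append(read((s_i + s_j) % 256))
    write_back()
    return out, stats


if __name__ == "__main__":
    sbox = list(range(256))
    random.Random(0).shuffle(sbox)        # arbitrary permutation for the demo
    out, stats = keystream_cached(sbox, 64, n=8)
    print(out == keystream_reference(sbox, 64), stats)
```

In this sketch the single-byte accesses to the storage device are limited to S[j] and S[t] reads and at most one S[j] location write per iteration, which is consistent with the two read ports and one write port described above.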

FIG. 6 illustrates how a three-stage pipeline may be used to generate a key stream for the embodiment shown in FIG. 5. To pipeline key stream generation, the controller 540 and the key generator 550 (especially the scheduler 580), as shown in FIG. 5, may need to be modified so that each iteration may start at each clock cycle, when possible. In FIG. 6, “Raddr n_(m)” denotes an operation of reading the n_(m) ^(th) byte of the S-box from the S-box storage device for generating the m^(th) byte of the key stream, K_(m). “Waddr n_(m)” denotes an operation of writing to the S-box storage device at the n_(m) ^(th) byte of the S-box for generating the m^(th) byte of the key stream, K_(m). “Raddr n_(m)-n_(I)” denotes an operation of reading N consecutive bytes of the S-box, starting from the n_(m) ^(th) byte, from the S-box storage device to the cache for generating the m^(th) through the I^(th) bytes of the key stream, K_(m) through K_(I) (where N is the size of the cache and I=m+N−1). “Waddr n_(m)-n_(I)” denotes an operation of writing N bytes from the cache to the S-box storage device to replace N consecutive bytes originally read from the S-box storage device for generating K_(m) through K_(I) (where N is the size of the cache and I=m+N−1). For convenience, FIG. 6 does not show any addition operation that comes along with each read operation. The size of the cache is typically the same as the line size of a register file, which may be 4 bytes, 8 bytes, 16 bytes, or any other number. FIG. 6 assumes that the size of the cache is 8 bytes so that 8 consecutive bytes from the S-box storage device may be read at a time and stored in the cache.

As shown in FIG. 6, in cycle 1, the value of index variable i₁ is obtained by increasing the initial value i₀ by 1; and 8 consecutive bytes in the S-box storage device, starting from the i₁ ^(th) byte, are read to the cache. In cycle 2, the value of index variable j₁ is calculated by adding the previous value of j (i.e., initial value of j, which is j₀) and S[i₁] (i.e., j₁=(j₀+S[i₁]) mod 256), and subsequently the j₁ ^(th) byte of the S-box, S[j₁], is read from the cache if it is in the cache, and otherwise from the S-box storage device. In cycle 3, S[j₁] is written to the cache to replace the i₁ ^(th) byte of the S-box, and S[i₁] is written to the cache to replace S[j₁] if S[j₁] was read from the cache in cycle 2, and otherwise, S[i₁] is written back to the S-box storage device to replace S[j₁]; j₂ is obtained (j₂=(j₁+S[i₂]) mod 256) and S[j₂] is read from the cache if it is in the cache, and otherwise from the S-box storage device; and the value of index variable t₁ is obtained (t₁=(S[i₁]+S[j₁]) mod 256) and S[t₁] is read from the cache if it is in the cache, and otherwise from the S-box storage device. By the end of cycle 3, the first byte of the key stream, K₁ (=S[t₁]), is generated, which may be used to encrypt the first byte of the input data. From cycle 3 to cycle 10 (including cycles 3 and 10), two read operations and one write operation are performed simultaneously with each cycle; and a byte of the key stream is generated at the end of each cycle.

After cycle 9, all 8 bytes in the cache, which were read from the S-box storage device in cycle 1, have been read and used to generate the value for index variable j. In cycle 10, another 8 consecutive bytes in the S-box, starting from the i₉ ^(th) byte (i₉=i₈+1=i₁+8), are read from the S-box storage device to the cache through the cache write mechanism, which may temporarily hold these newly-read 8 bytes before bytes currently in the cache are written back to the S-box storage device. Also in cycle 10, S[j₈] is written to the cache to replace the i₈ ^(th) byte of the S-box, and S[i₈] is written to the cache to replace S[j₈] if S[j₈] was read from the cache in cycle 9, and otherwise, S[i₈] is written back to the S-box storage device to replace S[j₈] there; and t₈ is obtained (t₈=(S[i₈]+S[j₈]) mod 256) and S[t₈] is read from the cache if it is in the cache, and otherwise from the S-box storage device. In cycle 11, the 8 bytes currently in the cache may be written back to the S-box storage device to replace bytes from the i₁ ^(th) to the i₈ ^(th) (bytes being written back may not be the same as those bytes originally read from the same place in the S-box storage device because of swapping operations). Subsequently, the 8 consecutive bytes newly read in cycle 10 may be moved from the cache write mechanism to the cache. Also in cycle 11, j₉ is obtained (j₉=(j₈+S[i₉]) mod 256; the value of S[i₉] is available because the 8 bytes were read from the S-box storage device in cycle 10) and S[j₉] is read from the S-box storage device. In the next 8 cycles starting from cycle 12, two read operations and one write operation will be performed and one byte of the key stream may be generated in each cycle. Overall, using a three-stage pipeline as illustrated in FIG. 6, an embodiment of this invention may achieve a throughput of 8 bytes every 9 cycles for key stream generation. This throughput is close to one byte per cycle.
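The throughput figure may be expressed as simple arithmetic. The following is a minimal sketch, assuming that each group of N key stream bytes costs one additional cycle for the cache write-back and refill, as in the 8-byte case described above; the function name keystream_throughput and the extension to cache sizes other than 8 are illustrative assumptions and are not results stated for FIG. 6.

```python
from fractions import Fraction


def keystream_throughput(cache_bytes: int, extra_cycles: int = 1) -> Fraction:
    """Bytes of key stream per cycle when every group of cache_bytes bytes
    takes cache_bytes + extra_cycles cycles (assumed steady-state model)."""
    return Fraction(cache_bytes, cache_bytes + extra_cycles)


if __name__ == "__main__":
    for n in (4, 8, 16):
        rate = keystream_throughput(n)
        print(n, rate, round(float(rate), 3))
```

Under this assumption an 8-byte cache yields 8/9 of a byte per cycle, and a larger cache would bring the rate closer to the one byte per cycle of the five-stage pipeline at the cost of more flip-flops.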

FIG. 7 depicts a network system that can perform RC4 ciphering. The system may comprise a collection of line cards 720 (“blades”) interconnected by a switch fabric 710 (e.g., a crossbar or shared memory switch fabric). Individual line cards may be located in the same physical location or different physical locations (e.g., different cities). The switch fabric, for example, may conform to Common Switch Interface (CSIX) or other fabric technologies such as HyperTransport, Infiniband, Peripheral Component Interconnect (PCI), Packet-Over-SONET (Synchronous Optic Network), RapidIO, and/or UTOPIA (Universal Test and Operations PHY (Physical Layer) Interface for ATM).

Individual line cards (e.g., 720A) may include one or more physical layer (PHY) devices 722 (e.g., optic, wire, and wireless PHYs) that handle communication over network connections. The PHYs translate between the physical signals carried by different network mediums and the bits (e.g., “0”-s and “1”-s) used by digital systems. The line cards 720 may also include framer devices (e.g., Ethernet, Synchronous Optic Network (SONET), High-Level Data Link (HDLC) framers or other “layer 2” devices) 724 that can perform operations on frames such as error detection and/or correction. The line cards 720 shown may also include one or more network processors 726 that perform packet processing operations for packets received via the PHY(s) 722 and direct the packets, via the switch fabric 710, to a line card providing an egress interface to forward the packet. Potentially, the network processor(s) 726 may perform “layer 2” duties instead of the framer devices 724.

The network processor(s) 726 may be an Intel® Internet eXchange network Processor (IXP) or other network processors featuring different designs. The network processor features a collection of packet processing engines on a single integrated circuit. Individual engines may provide multiple threads of execution. Additionally, the network processor includes a core processor that is often programmed to perform “control plane” tasks involved in network operations. The core processor, however, may also handle “data plane” tasks. The network processor 726 also features at least one interface that can carry packets between the processor and other network components. For example, the processor can feature a switch fabric interface 710 that enables the processor 726 to transmit a packet to other processor(s) or circuitry connected to the fabric. The processor 726 can also feature an interface that enables the processor to communicate with physical layer (PHY) and/or link layer devices (e.g., MAC or framer devices). The processor 726 also includes an interface (e.g., a Peripheral Component Interconnect (PCI) bus interface) for communicating, for example, with a host or other network processors. Moreover, the processor 726 also includes other components shared by the engines such as memory controllers, a hash engine, and internal scratchpad memory.

As shown in FIG. 7, each line card 720 may be operably coupled with at least one RC4 module 730 (e.g., 730A) that performs RC4 ciphering. In one embodiment, the RC4 module may be separate from the line card. In another embodiment, the RC4 module may be integrated with the line card. Also in one embodiment, the RC4 module may be a part of the network processor 726 or a part of the PHY 722. Yet in another embodiment, the RC4 module may be located in other network layers such as link layer, network layer, and/or application layer.

Although an example embodiment of the present disclosure is described with reference to diagrams in FIGS. 1-7, persons of ordinary skill in the art will readily appreciate that many other methods of implementing the present disclosure may alternatively be used. For example, the order of execution of the functional blocks or process procedures may be changed, and/or some of the functional blocks or process procedures described may be changed, eliminated, or combined.

In the preceding description, various aspects of the present disclosure have been described. For purposes of explanation, specific numbers, systems and configurations were set forth in order to provide a thorough understanding of the present disclosure. However, it is apparent to one skilled in the art having the benefit of this disclosure that the present disclosure may be practiced without the specific details. In other instances, well-known features, components, or modules were omitted, simplified, combined, or split in order not to obscure the present disclosure.

Embodiments of the present disclosure described herein may be implemented in circuitry, which includes hardwired circuitry, digital circuitry, analog circuitry, programmable circuitry, and so forth. They may also be implemented in computer programs. Such computer programs may be coded in a high level procedural or object oriented programming language. However, the program(s) can be implemented in assembly or machine language if desired. The language may be compiled or interpreted. Additionally, these techniques may be used in a wide variety of networking environments. Such computer programs may be stored on a storage medium or device (e.g., hard disk drive, floppy disk drive, read only memory (ROM), CD-ROM device, flash memory device, digital versatile disk (DVD), or other storage device) readable by a general or special purpose programmable processing system, for configuring and operating the processing system when the storage medium or device is read by the processing system to perform the procedures described herein. Embodiments of the disclosure may also be considered to be implemented as a machine-readable storage medium, configured for use with a processing system, where the storage medium so configured causes the processing system to operate in a specific and predefined manner to perform the functions described herein.
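As one illustration of a software implementation, the following minimal Python sketch follows the textbook RC4 procedure: the S-box is initialized from the secret key, one key stream byte is generated per input byte, and each key stream byte is XORed with the corresponding input byte. The function names (rc4_init, rc4_keystream, rc4_crypt) are hypothetical and chosen for this example only. The sketch does not attempt to model the pipelining, register file, or cache of the disclosed embodiments.

```python
def rc4_init(key: bytes) -> list:
    """Build the 256-byte S-box from a secret key of 1 to 256 bytes."""
    if not 1 <= len(key) <= 256:
        raise ValueError("key must be 1 to 256 bytes long")
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(s: list, n: int) -> bytes:
    """Generate n key stream bytes, updating the S-box s in place."""
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256                      # update the first index
        first = s[i]                           # read the first byte
        j = (j + first) % 256                  # update the second index
        second = s[j]                          # read the second byte
        s[i], s[j] = second, first             # swap the two bytes
        out.append(s[(first + second) % 256])  # third byte is the output
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data by XORing it with the key stream."""
    s = rc4_init(key)
    ks = rc4_keystream(s, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    cipher = rc4_crypt(b"Key", b"Plaintext")
    print(cipher.hex())
    print(rc4_crypt(b"Key", cipher))
```

Because encryption and decryption are the same XOR against the same key stream, applying rc4_crypt twice with the same key returns the original input.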

While this disclosure has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments of the disclosure, which are apparent to persons skilled in the art to which the disclosure pertains, are deemed to lie within the spirit and scope of the disclosure.

1. An apparatus for performing RC4 ciphering, comprising: a byte array storage device to store an array of bytes; a cache to store a plurality of bytes of the array of bytes read from the byte array storage device; and a key generator to generate a key stream for use in ciphering input data, based on data stored in at least one of the byte array storage device and the cache.
 2. The apparatus of claim 1, wherein initial values of the array of bytes are generated from a secret key shared by a receiver and a transmitter in a network system.
 3. The apparatus of claim 1, wherein the byte array storage device comprises a 2-read and 1-write register file.
 4. The apparatus of claim 1, further comprising a controller to control the byte array storage device, the cache, and the key generator to pipeline generation of the key stream.
 5. The apparatus of claim 1, further comprising an XOR mechanism to XOR the input data with the key stream byte by byte to generate an encrypted version for the input data, under the control of a controller.
 6. The apparatus of claim 1, wherein the key generator generates the key stream byte by byte under the control of a controller.
 7. The apparatus of claim 4, wherein the controller detects and resolves collisions between the byte array storage device and the cache during read/write operations.
 8. The apparatus of claim 1, wherein the key generator comprises: a first selector to select a first byte in the array of bytes from the cache based on a first index; a first adder to obtain a second index, based at least in part on the value of the first byte; a second selector to select a second byte in the array of bytes from at least one of the byte array storage device and the cache, based on the second index; a swapping mechanism to swap values of the first byte and the second byte; a second adder to obtain a third index, based at least in part on values of the first byte and the second byte; and a third selector to select a third byte in the array of bytes from at least one of the byte array storage device and the cache, based on the third index.
 9. The apparatus of claim 8, further comprising a scheduler to provide initial values for the first index and the second index, to perform increment operations for the first index, and to provide one input data for the first adder.
 10. The apparatus of claim 9, wherein the scheduler coordinates with the controller to provide read/write addresses for at least one of the byte array storage device and the cache.
 11. The apparatus of claim 9, wherein the scheduler coordinates with the controller to pipeline key stream generation.
 12. The apparatus of claim 1, further comprising a cache write mechanism to facilitate writing bytes from the byte array storage device to the cache.
 13. A method for performing RC4 ciphering, comprising: receiving input data; reading a plurality of bytes in an array of bytes from a byte array storage device to a cache; initializing a first index and a second index for use in accessing bytes of the array of bytes stored in at least one of the byte array storage device and the cache; and for each byte of the input data: generating a byte of a key stream, and producing an encrypted byte for the byte of the input data based at least in part on the byte of the key stream.
 14. The method of claim 13, wherein the process for each byte of the input data is pipelined so that a second iteration can start before a first iteration is completed.
 15. The method of claim 13, wherein generating the byte of the key stream comprises: updating the first index; reading a first byte in the array of bytes from the cache based on the first index; updating the second index based at least in part on the first byte; reading a second byte in the array of bytes from at least one of the byte array storage device and the cache, based on the second index; swapping the first byte and the second byte; calculating a third index based on the first byte and the second byte; and reading a third byte in the array of bytes from at least one of the byte array storage device and the cache, based on the third index.
 16. The method of claim 15, wherein updating the first index comprises incrementing a value of the first index during a preceding iteration, modulo the number of bytes in the array of bytes.
 17. The method of claim 15, wherein updating the second index comprises adding a value of the second index during a preceding iteration and a value of the first byte, modulo the number of bytes in the array of bytes.
 18. The method of claim 15, wherein calculating the third index comprises adding a value of the first byte and a value of the second byte, modulo the number of bytes in the array of bytes.
 19. The method of claim 15, wherein swapping the first byte and the second byte comprises writing the second byte to replace the first byte in the cache and writing the first byte to replace the second byte in at least one of the byte array storage device and the cache.
 20. The method of claim 13, wherein producing the encrypted byte comprises performing an XOR operation between the third byte and the byte of the input data.
 21. The method of claim 13, further comprising reading another plurality of bytes in the array of bytes from the byte array storage device after the plurality of bytes in the cache have been used, and continuing to process for each remaining byte of the input data.
 22. The method of claim 13, further comprising writing bytes in the cache back to the byte array storage device after the bytes in the cache have been used in updating values of the second index.
 23. The method of claim 13, further comprising detecting and resolving collisions between the byte array storage device and the cache in at least one of a reading operation and a writing operation.
 24. A network system, comprising: a switch fabric; a plurality of line cards interconnected by the switch fabric; and a plurality of RC4 modules, each operably coupled with a line card to perform RC4 ciphering, an RC4 module including: a byte array storage device to store an array of bytes, a cache to store a plurality of bytes of the array of bytes read from the byte array storage device, and a key generator to generate a key stream for use in ciphering input data, based on data stored in at least one of the byte array storage device and the cache.
 25. The network system of claim 24, wherein initial values of the array of bytes are generated from a secret key shared by a receiver and a transmitter in the network system.
 26. The network system of claim 24, wherein the byte array storage device comprises a 2-read and 1-write register file.
 27. The network system of claim 24, further comprising a controller to control the byte array storage device, the cache, and the key generator to pipeline generation of the key stream.
 28. The network system of claim 24, further comprising an XOR mechanism to XOR the input data with the key stream byte by byte to generate an encrypted version for the input data, under the control of a controller.
 29. The network system of claim 24, wherein the key generator generates the key stream byte by byte under the control of a controller.
 30. The network system of claim 27, wherein the controller detects and resolves collisions between the byte array storage device and the cache during read/write operations.
 31. The network system of claim 24, wherein the key generator comprises: a first selector to select a first byte in the array of bytes from the cache based on a first index; a first adder to obtain a second index, based at least in part on the value of the first byte; a second selector to select a second byte in the array of bytes from at least one of the byte array storage device and the cache, based on the second index; a swapping mechanism to swap values of the first byte and the second byte; a second adder to obtain a third index, based at least in part on values of the first byte and the second byte; and a third selector to select a third byte in the array of bytes from at least one of the byte array storage device and the cache, based on the third index.
 32. The network system of claim 31, further comprising a scheduler to provide initial values for the first index and the second index, to perform increment operations for the first index, and to provide one input data for the first adder.
 33. The network system of claim 32, wherein the scheduler coordinates with the controller to provide read/write addresses for at least one of the byte array storage device and the cache.
 34. The network system of claim 32, wherein the scheduler coordinates with the controller to pipeline key stream generation.
 35. The network system of claim 24, further comprising a cache write mechanism to facilitate writing bytes from the byte array storage device to the cache. 
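
For illustration only, the interplay between the byte array storage device and the cache recited in the method claims above can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the disclosed hardware: the cache is assumed to hold an aligned window of CACHE_SIZE consecutive S-box bytes around the first index, the window is assumed to be written back and refilled when the first index leaves it, and the names CACHE_SIZE, init_sbox, CachedSBox, and cached_keystream are hypothetical. Pipelining and collision handling are not modeled; every access simply consults the cache first.

```python
CACHE_SIZE = 8  # assumed window size; must divide 256 in this sketch


def init_sbox(key: bytes) -> list:
    """Standard RC4 S-box initialization from the secret key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


class CachedSBox:
    """S-box storage plus a small cache of consecutive bytes."""

    def __init__(self, key: bytes):
        self.store = init_sbox(key)            # byte array storage device
        self.base = 0                          # index of first cached byte
        self.cache = self.store[:CACHE_SIZE]   # bytes read into the cache

    def _hit(self, idx: int) -> bool:
        return self.base <= idx < self.base + CACHE_SIZE

    def advance(self, i: int) -> None:
        """Write the cache back and refill it when i leaves the window."""
        if not self._hit(i):
            self.store[self.base:self.base + CACHE_SIZE] = self.cache
            self.base = i - (i % CACHE_SIZE)
            self.cache = self.store[self.base:self.base + CACHE_SIZE]

    def read(self, idx: int) -> int:
        # Cached bytes are authoritative; otherwise read the storage device.
        return self.cache[idx - self.base] if self._hit(idx) else self.store[idx]

    def write(self, idx: int, value: int) -> None:
        if self._hit(idx):
            self.cache[idx - self.base] = value
        else:
            self.store[idx] = value


def cached_keystream(key: bytes, n: int) -> bytes:
    """Generate n key stream bytes using the cache model above."""
    box = CachedSBox(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        box.advance(i)
        first = box.read(i)       # always served from the cache
        j = (j + first) % 256
        second = box.read(j)      # from the cache or the storage device
        box.write(i, second)      # swap: second byte replaces the first
        box.write(j, first)       # swap: first byte replaces the second
        out.append(box.read((first + second) % 256))
    return bytes(out)
```

Since the cache is treated as authoritative for the bytes it holds, this model yields the same key stream as an uncached RC4 implementation; only the location from which each byte is read or written differs.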